High-Performance Matrix-Vector Multiplication on the GPU

Author

  • Hans Henrik Brandenborg Sørensen
Abstract

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be applied successfully to the GPU kernel so that it performs well for all matrix shapes and sizes.
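
As a rough illustration of the fine-grained parallelism mentioned in the abstract, the sketch below emulates in plain Python how one row's dot product might be split across several cooperating threads and then combined with a pairwise reduction. It is not the paper's kernel: the function names and the `threads_per_row` parameter are hypothetical, and the sequential loops stand in for work that a GPU would run concurrently.

```python
# Illustrative sketch only: emulates in plain Python how a GPU kernel might
# split one row's dot product across several cooperating threads.
# `threads_per_row` is a hypothetical tuning parameter, not from the paper.

def gemv_reference(A, x):
    """Sequential y = A x, used as a correctness baseline."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gemv_threads_per_row(A, x, threads_per_row=4):
    """Compute y = A x with `threads_per_row` emulated threads per row.

    Each emulated thread accumulates a strided partial sum; the partial
    sums are then combined with a pairwise (tree) reduction.
    `threads_per_row` must be a power of two.
    """
    if threads_per_row < 1 or threads_per_row & (threads_per_row - 1):
        raise ValueError("threads_per_row must be a power of two")
    y = []
    for row in A:
        # Emulated thread t handles columns t, t + T, t + 2T, ...
        partial = [
            sum(row[j] * x[j] for j in range(t, len(row), threads_per_row))
            for t in range(threads_per_row)
        ]
        # Pairwise tree reduction: halve the number of active threads each step.
        stride = threads_per_row // 2
        while stride > 0:
            for t in range(stride):
                partial[t] += partial[t + stride]
            stride //= 2
        y.append(partial[0])
    return y
```

Under these assumptions, an auto-tuner could time several values of `threads_per_row` for a given matrix shape and keep the fastest one: wide, short matrices would tend to favor more threads per row, while tall, narrow ones would tend to favor fewer.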

Similar resources

A hybrid format for better performance of sparse matrix-vector multiplication on a GPU

In this paper, we present a new sparse matrix data format that leads to improved memory coalescing and more efficient sparse matrix-vector multiplication (SpMV) for a wide range of problems on high throughput architectures such as a graphics processing unit (GPU). The sparse matrix structure is constructed by sorting the rows based on the row length (defined as the number of non-zero elements i...

Effective Sparse Matrix Representation for the GPU Architectures

General-purpose computation on graphics processing units (GPUs) is prominent in today's high-performance computing. Porting or accelerating data-parallel applications onto a GPU yields a baseline performance improvement because of the increased number of computational units. Better performance can be achieved if application-specific fine-tuning is done with respect to the architecture under con...

Sparse-matrix vector multiplication on hybrid CPU+GPU platform

Sparse matrix-vector multiplication (SpMV) is a basic operation in many linear algebra kernels, so it is of interest to have an SpMV implementation on modern architectures such as the GPU. Because it is an irregular computation, the CPU can also perform comparably to the GPU, which makes hybrid CPU+GPU architectures attractive for this routine. We have therefore designed a hybrid algorithm for SpMV that uses a CPU and a GPU. We have ex...

Development for Parallel Hopfield Neural Network Implemented for GPU

This paper presents some new implementations of a parallel Hopfield neural network model, the Cauchy machine, on graphics processing units (GPUs). The main operators in the parallel Hopfield neural network are loaded into RGBA textures so that input calculation, output update and terminal condition check are implemented by fragment processors on the GPU. In addition, the matrix-vector multiplication is realized ...

Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors

Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of ...

Publication date: 2011